Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 20433 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.7 MiB |
| Average record size in memory | 88.0 B |
Variable types
| Numeric | 10 |
|---|---|
| Categorical | 1 |
| longitude is highly correlated with latitude | High correlation |
|---|---|
| latitude is highly correlated with longitude | High correlation |
| total_rooms is highly correlated with total_bedrooms and 2 other fields | High correlation |
| total_bedrooms is highly correlated with total_rooms and 2 other fields | High correlation |
| population is highly correlated with total_rooms and 2 other fields | High correlation |
| households is highly correlated with total_rooms and 2 other fields | High correlation |
| median_income is highly correlated with median_house_value | High correlation |
| median_house_value is highly correlated with median_income | High correlation |
| df_index is highly correlated with longitude and 3 other fields | High correlation |
| longitude is highly correlated with df_index and 3 other fields | High correlation |
| latitude is highly correlated with df_index and 3 other fields | High correlation |
| median_house_value is highly correlated with df_index and 4 other fields | High correlation |
| ocean_proximity is highly correlated with df_index and 3 other fields | High correlation |
| df_index is uniformly distributed | Uniform |
| df_index has unique values | Unique |
Reproduction
| Analysis started | 2022-03-21 14:45:02.677471 |
|---|---|
| Analysis finished | 2022-03-21 14:45:42.889439 |
| Duration | 40.21 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
df_index
Real number (ℝ≥0)
| Distinct | 20433 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10316.17609 |
| Minimum | 0 |
|---|---|
| Maximum | 20639 |
| Zeros | 1 |
| Zeros (%) | < 0.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1027.6 |
| Q1 | 5162 |
| median | 10319 |
| Q3 | 15473 |
| 95-th percentile | 19601.4 |
| Maximum | 20639 |
| Range | 20639 |
| Interquartile range (IQR) | 10311 |
Descriptive statistics
| Standard deviation | 5956.699278 |
|---|---|
| Coefficient of variation (CV) | 0.5774134938 |
| Kurtosis | -1.198669631 |
| Mean | 10316.17609 |
| Median Absolute Deviation (MAD) | 5156 |
| Skewness | -0.0007732714102 |
| Sum | 210790426 |
| Variance | 35482266.29 |
| Monotonicity | Strictly increasing |
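Several of the df_index statistics above are simple functions of each other. The sketch below (plain arithmetic on the reported values, not pandas-profiling's own code) checks that CV = standard deviation / mean, variance = standard deviation², IQR = Q3 − Q1 and range = maximum − minimum. It also compares the reported kurtosis with the excess kurtosis of a discrete uniform distribution on 0..20639, which is an assumption used only as a reference point: df_index has 207 gaps in that range, so the match is expected to be close rather than exact.

```python
import math

# Values copied from the df_index tables above.
mean = 10316.17609
std = 5956.699278
variance = 35482266.29
reported_kurtosis = -1.198669631
q1, q3 = 5162, 15473
minimum, maximum = 0, 20639

cv = std / mean                 # coefficient of variation = sigma / mu
iqr = q3 - q1                   # interquartile range
value_range = maximum - minimum

# Excess kurtosis of a discrete uniform distribution on n equally spaced points.
n_points = maximum - minimum + 1
uniform_kurtosis = -6 * (n_points ** 2 + 1) / (5 * (n_points ** 2 - 1))

print(f"CV                = {cv:.10f}")
print(f"std ** 2          = {std ** 2:.2f}  (reported variance {variance})")
print(f"IQR               = {iqr}")
print(f"Range             = {value_range}")
print(f"uniform kurtosis  = {uniform_kurtosis:.6f}  (reported {reported_kurtosis})")
print("close to uniform:", math.isclose(reported_kurtosis, uniform_kurtosis, abs_tol=0.01))
```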
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 13749 | 1 | < 0.1% |
| 13756 | 1 | < 0.1% |
| 13755 | 1 | < 0.1% |
| 13754 | 1 | < 0.1% |
| 13753 | 1 | < 0.1% |
| 13752 | 1 | < 0.1% |
| 13751 | 1 | < 0.1% |
| 13750 | 1 | < 0.1% |
| 13748 | 1 | < 0.1% |
| Other values (20423) | 20423 | > 99.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1 | < 0.1% |
| 1 | 1 | < 0.1% |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 7 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 20639 | 1 | < 0.1% |
| 20638 | 1 | < 0.1% |
| 20637 | 1 | < 0.1% |
| 20636 | 1 | < 0.1% |
| 20635 | 1 | < 0.1% |
| 20634 | 1 | < 0.1% |
| 20633 | 1 | < 0.1% |
| 20632 | 1 | < 0.1% |
| 20631 | 1 | < 0.1% |
| 20630 | 1 | < 0.1% |
longitude
Real number (ℝ)
| Distinct | 844 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | -119.5706886 |
| Minimum | -124.35 |
|---|---|
| Maximum | -114.31 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 20433 |
| Negative (%) | 100.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | -124.35 |
|---|---|
| 5-th percentile | -122.47 |
| Q1 | -121.8 |
| median | -118.49 |
| Q3 | -118.01 |
| 95-th percentile | -117.08 |
| Maximum | -114.31 |
| Range | 10.04 |
| Interquartile range (IQR) | 3.79 |
Descriptive statistics
| Standard deviation | 2.003577891 |
|---|---|
| Coefficient of variation (CV) | -0.01675643014 |
| Kurtosis | -1.332548154 |
| Mean | -119.5706886 |
| Median Absolute Deviation (MAD) | 1.29 |
| Skewness | -0.2961409006 |
| Sum | -2443187.88 |
| Variance | 4.014324364 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| -118.31 | 159 | 0.8% |
| -118.3 | 157 | 0.8% |
| -118.29 | 146 | 0.7% |
| -118.27 | 141 | 0.7% |
| -118.32 | 141 | 0.7% |
| -118.28 | 139 | 0.7% |
| -118.35 | 138 | 0.7% |
| -118.36 | 135 | 0.7% |
| -118.19 | 134 | 0.7% |
| -118.25 | 126 | 0.6% |
| Other values (834) | 19017 | 93.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -124.35 | 1 | < 0.1% |
| -124.3 | 2 | < 0.1% |
| -124.27 | 1 | < 0.1% |
| -124.26 | 1 | < 0.1% |
| -124.25 | 1 | < 0.1% |
| -124.23 | 3 | < 0.1% |
| -124.22 | 1 | < 0.1% |
| -124.21 | 3 | < 0.1% |
| -124.19 | 4 | < 0.1% |
| -124.18 | 6 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| -114.31 | 1 | < 0.1% |
| -114.47 | 1 | < 0.1% |
| -114.49 | 1 | < 0.1% |
| -114.55 | 1 | < 0.1% |
| -114.56 | 1 | < 0.1% |
| -114.57 | 3 | < 0.1% |
| -114.58 | 2 | < 0.1% |
| -114.59 | 1 | < 0.1% |
| -114.6 | 3 | < 0.1% |
| -114.61 | 3 | < 0.1% |
latitude
Real number (ℝ≥0)
| Distinct | 861 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 35.63322126 |
| Minimum | 32.54 |
|---|---|
| Maximum | 41.95 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 32.54 |
|---|---|
| 5-th percentile | 32.82 |
| Q1 | 33.93 |
| median | 34.26 |
| Q3 | 37.72 |
| 95-th percentile | 38.96 |
| Maximum | 41.95 |
| Range | 9.41 |
| Interquartile range (IQR) | 3.79 |
Descriptive statistics
| Standard deviation | 2.136347666 |
|---|---|
| Coefficient of variation (CV) | 0.05995381812 |
| Kurtosis | -1.119522552 |
| Mean | 35.63322126 |
| Median Absolute Deviation (MAD) | 1.23 |
| Skewness | 0.464934277 |
| Sum | 728093.61 |
| Variance | 4.563981352 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 34.06 | 241 | 1.2% |
| 34.08 | 232 | 1.1% |
| 34.05 | 229 | 1.1% |
| 34.07 | 227 | 1.1% |
| 34.04 | 215 | 1.1% |
| 34.09 | 209 | 1.0% |
| 34.02 | 207 | 1.0% |
| 34.1 | 201 | 1.0% |
| 34.03 | 189 | 0.9% |
| 33.93 | 181 | 0.9% |
| Other values (851) | 18302 | 89.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 32.54 | 1 | < 0.1% |
| 32.55 | 3 | < 0.1% |
| 32.56 | 10 | < 0.1% |
| 32.57 | 18 | 0.1% |
| 32.58 | 26 | 0.1% |
| 32.59 | 11 | 0.1% |
| 32.6 | 9 | < 0.1% |
| 32.61 | 14 | 0.1% |
| 32.62 | 13 | 0.1% |
| 32.63 | 18 | 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 41.95 | 2 | < 0.1% |
| 41.92 | 1 | < 0.1% |
| 41.88 | 1 | < 0.1% |
| 41.86 | 3 | < 0.1% |
| 41.84 | 1 | < 0.1% |
| 41.82 | 1 | < 0.1% |
| 41.81 | 2 | < 0.1% |
| 41.8 | 3 | < 0.1% |
| 41.79 | 1 | < 0.1% |
| 41.78 | 3 | < 0.1% |
housing_median_age
Real number (ℝ≥0)
| Distinct | 52 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.63309353 |
| Minimum | 1 |
|---|---|
| Maximum | 52 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 8 |
| Q1 | 18 |
| median | 29 |
| Q3 | 37 |
| 95-th percentile | 52 |
| Maximum | 52 |
| Range | 51 |
| Interquartile range (IQR) | 19 |
Descriptive statistics
| Standard deviation | 12.5918052 |
|---|---|
| Coefficient of variation (CV) | 0.439764051 |
| Kurtosis | -0.8010133431 |
| Mean | 28.63309353 |
| Median Absolute Deviation (MAD) | 10 |
| Skewness | 0.06160542583 |
| Sum | 585060 |
| Variance | 158.5535582 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 1265 | 6.2% |
| 36 | 856 | 4.2% |
| 35 | 818 | 4.0% |
| 16 | 762 | 3.7% |
| 17 | 694 | 3.4% |
| 34 | 682 | 3.3% |
| 26 | 611 | 3.0% |
| 33 | 609 | 3.0% |
| 25 | 562 | 2.8% |
| 32 | 560 | 2.7% |
| Other values (42) | 13014 | 63.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 4 | < 0.1% |
| 2 | 58 | 0.3% |
| 3 | 62 | 0.3% |
| 4 | 190 | 0.9% |
| 5 | 242 | 1.2% |
| 6 | 157 | 0.8% |
| 7 | 173 | 0.8% |
| 8 | 203 | 1.0% |
| 9 | 204 | 1.0% |
| 10 | 263 | 1.3% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 1265 | 6.2% |
| 51 | 47 | 0.2% |
| 50 | 135 | 0.7% |
| 49 | 133 | 0.7% |
| 48 | 174 | 0.9% |
| 47 | 195 | 1.0% |
| 46 | 245 | 1.2% |
| 45 | 286 | 1.4% |
| 44 | 353 | 1.7% |
| 43 | 351 | 1.7% |
total_rooms
Real number (ℝ≥0)
| Distinct | 5911 |
|---|---|
| Distinct (%) | 28.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2636.504233 |
| Minimum | 2 |
|---|---|
| Maximum | 39320 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 622 |
| Q1 | 1450 |
| median | 2127 |
| Q3 | 3143 |
| 95-th percentile | 6217 |
| Maximum | 39320 |
| Range | 39318 |
| Interquartile range (IQR) | 1693 |
Descriptive statistics
| Standard deviation | 2185.269567 |
|---|---|
| Coefficient of variation (CV) | 0.8288511505 |
| Kurtosis | 32.7138594 |
| Mean | 2636.504233 |
| Median Absolute Deviation (MAD) | 795 |
| Skewness | 4.158816423 |
| Sum | 53871691 |
| Variance | 4775403.08 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 1527 | 18 | 0.1% |
| 1582 | 17 | 0.1% |
| 1613 | 17 | 0.1% |
| 2127 | 16 | 0.1% |
| 1471 | 15 | 0.1% |
| 2053 | 15 | 0.1% |
| 1607 | 15 | 0.1% |
| 1722 | 15 | 0.1% |
| 1717 | 15 | 0.1% |
| 1703 | 15 | 0.1% |
| Other values (5901) | 20275 | 99.2% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 8 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 15 | 2 | < 0.1% |
| 16 | 1 | < 0.1% |
| 18 | 4 | < 0.1% |
| 19 | 2 | < 0.1% |
| 20 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 39320 | 1 | < 0.1% |
| 37937 | 1 | < 0.1% |
| 32627 | 1 | < 0.1% |
| 32054 | 1 | < 0.1% |
| 30450 | 1 | < 0.1% |
| 30405 | 1 | < 0.1% |
| 30401 | 1 | < 0.1% |
| 28258 | 1 | < 0.1% |
| 27870 | 1 | < 0.1% |
| 27700 | 1 | < 0.1% |
total_bedrooms
Real number (ℝ≥0)
| Distinct | 1923 |
|---|---|
| Distinct (%) | 9.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 537.8705525 |
| Minimum | 1 |
|---|---|
| Maximum | 6445 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 137 |
| Q1 | 296 |
| median | 435 |
| Q3 | 647 |
| 95-th percentile | 1275.4 |
| Maximum | 6445 |
| Range | 6444 |
| Interquartile range (IQR) | 351 |
Descriptive statistics
| Standard deviation | 421.3850701 |
|---|---|
| Coefficient of variation (CV) | 0.7834321252 |
| Kurtosis | 21.98557506 |
| Mean | 537.8705525 |
| Median Absolute Deviation (MAD) | 162 |
| Skewness | 3.459546332 |
| Sum | 10990309 |
| Variance | 177565.3773 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 280 | 55 | 0.3% |
| 331 | 51 | 0.2% |
| 345 | 50 | 0.2% |
| 343 | 49 | 0.2% |
| 393 | 49 | 0.2% |
| 328 | 48 | 0.2% |
| 348 | 48 | 0.2% |
| 394 | 48 | 0.2% |
| 272 | 47 | 0.2% |
| 309 | 47 | 0.2% |
| Other values (1913) | 19941 | 97.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2 | 2 | < 0.1% |
| 3 | 5 | < 0.1% |
| 4 | 7 | < 0.1% |
| 5 | 6 | < 0.1% |
| 6 | 5 | < 0.1% |
| 7 | 6 | < 0.1% |
| 8 | 8 | < 0.1% |
| 9 | 7 | < 0.1% |
| 10 | 8 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6445 | 1 | < 0.1% |
| 6210 | 1 | < 0.1% |
| 5471 | 1 | < 0.1% |
| 5419 | 1 | < 0.1% |
| 5290 | 1 | < 0.1% |
| 5033 | 1 | < 0.1% |
| 5027 | 1 | < 0.1% |
| 4957 | 1 | < 0.1% |
| 4952 | 1 | < 0.1% |
| 4819 | 1 | < 0.1% |
population
Real number (ℝ≥0)
| Distinct | 3879 |
|---|---|
| Distinct (%) | 19.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1424.946949 |
| Minimum | 3 |
|---|---|
| Maximum | 35682 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 348 |
| Q1 | 787 |
| median | 1166 |
| Q3 | 1722 |
| 95-th percentile | 3284.4 |
| Maximum | 35682 |
| Range | 35679 |
| Interquartile range (IQR) | 935 |
Descriptive statistics
| Standard deviation | 1133.20849 |
|---|---|
| Coefficient of variation (CV) | 0.7952636348 |
| Kurtosis | 74.06088815 |
| Mean | 1424.946949 |
| Median Absolute Deviation (MAD) | 439 |
| Skewness | 4.960016542 |
| Sum | 29115941 |
| Variance | 1284161.481 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 891 | 25 | 0.1% |
| 1052 | 24 | 0.1% |
| 850 | 24 | 0.1% |
| 1227 | 24 | 0.1% |
| 761 | 24 | 0.1% |
| 825 | 22 | 0.1% |
| 782 | 22 | 0.1% |
| 1005 | 22 | 0.1% |
| 872 | 21 | 0.1% |
| 753 | 21 | 0.1% |
| Other values (3869) | 20204 | 98.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 1 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 1 | < 0.1% |
| 8 | 4 | < 0.1% |
| 9 | 2 | < 0.1% |
| 11 | 1 | < 0.1% |
| 13 | 4 | < 0.1% |
| 14 | 3 | < 0.1% |
| 15 | 2 | < 0.1% |
| 17 | 2 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 35682 | 1 | < 0.1% |
| 28566 | 1 | < 0.1% |
| 16305 | 1 | < 0.1% |
| 16122 | 1 | < 0.1% |
| 15507 | 1 | < 0.1% |
| 15037 | 1 | < 0.1% |
| 13251 | 1 | < 0.1% |
| 12873 | 1 | < 0.1% |
| 12427 | 1 | < 0.1% |
| 12203 | 1 | < 0.1% |
households
Real number (ℝ≥0)
| Distinct | 1809 |
|---|---|
| Distinct (%) | 8.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 499.4334655 |
| Minimum | 1 |
|---|---|
| Maximum | 6082 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 125 |
| Q1 | 280 |
| median | 409 |
| Q3 | 604 |
| 95-th percentile | 1159 |
| Maximum | 6082 |
| Range | 6081 |
| Interquartile range (IQR) | 324 |
Descriptive statistics
| Standard deviation | 382.2992259 |
|---|---|
| Coefficient of variation (CV) | 0.7654657774 |
| Kurtosis | 22.094083 |
| Mean | 499.4334655 |
| Median Absolute Deviation (MAD) | 151 |
| Skewness | 3.413850191 |
| Sum | 10204924 |
| Variance | 146152.6981 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 306 | 57 | 0.3% |
| 335 | 56 | 0.3% |
| 282 | 55 | 0.3% |
| 386 | 55 | 0.3% |
| 429 | 54 | 0.3% |
| 284 | 51 | 0.2% |
| 375 | 51 | 0.2% |
| 297 | 51 | 0.2% |
| 278 | 50 | 0.2% |
| 380 | 50 | 0.2% |
| Other values (1799) | 19903 | 97.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2 | 3 | < 0.1% |
| 3 | 4 | < 0.1% |
| 4 | 4 | < 0.1% |
| 5 | 7 | < 0.1% |
| 6 | 5 | < 0.1% |
| 7 | 10 | < 0.1% |
| 8 | 8 | < 0.1% |
| 9 | 9 | < 0.1% |
| 10 | 7 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6082 | 1 | < 0.1% |
| 5358 | 1 | < 0.1% |
| 5189 | 1 | < 0.1% |
| 5050 | 1 | < 0.1% |
| 4930 | 1 | < 0.1% |
| 4855 | 1 | < 0.1% |
| 4769 | 1 | < 0.1% |
| 4616 | 1 | < 0.1% |
| 4490 | 1 | < 0.1% |
| 4372 | 1 | < 0.1% |
median_income
Real number (ℝ≥0)
| Distinct | 12825 |
|---|---|
| Distinct (%) | 62.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.871161601 |
| Minimum | 0.4999 |
|---|---|
| Maximum | 15.0001 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 0.4999 |
|---|---|
| 5-th percentile | 1.60066 |
| Q1 | 2.5637 |
| median | 3.5365 |
| Q3 | 4.744 |
| 95-th percentile | 7.30034 |
| Maximum | 15.0001 |
| Range | 14.5002 |
| Interquartile range (IQR) | 2.1803 |
Descriptive statistics
| Standard deviation | 1.899291249 |
|---|---|
| Coefficient of variation (CV) | 0.4906256687 |
| Kurtosis | 4.943141125 |
| Mean | 3.871161601 |
| Median Absolute Deviation (MAD) | 1.0649 |
| Skewness | 1.644556916 |
| Sum | 79099.445 |
| Variance | 3.60730725 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 3.125 | 49 | 0.2% |
| 15.0001 | 48 | 0.2% |
| 2.875 | 46 | 0.2% |
| 4.125 | 44 | 0.2% |
| 2.625 | 44 | 0.2% |
| 3.875 | 41 | 0.2% |
| 3.375 | 38 | 0.2% |
| 4 | 37 | 0.2% |
| 3 | 37 | 0.2% |
| 3.625 | 36 | 0.2% |
| Other values (12815) | 20013 | 97.9% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.4999 | 12 | 0.1% |
| 0.536 | 10 | < 0.1% |
| 0.5495 | 1 | < 0.1% |
| 0.6433 | 1 | < 0.1% |
| 0.6775 | 1 | < 0.1% |
| 0.6825 | 1 | < 0.1% |
| 0.6831 | 1 | < 0.1% |
| 0.696 | 1 | < 0.1% |
| 0.6991 | 1 | < 0.1% |
| 0.7007 | 1 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 15.0001 | 48 | 0.2% |
| 15 | 2 | < 0.1% |
| 14.9009 | 1 | < 0.1% |
| 14.5833 | 1 | < 0.1% |
| 14.4219 | 1 | < 0.1% |
| 14.4113 | 1 | < 0.1% |
| 14.2959 | 1 | < 0.1% |
| 14.2867 | 1 | < 0.1% |
| 13.947 | 1 | < 0.1% |
| 13.8556 | 1 | < 0.1% |
median_house_value
Real number (ℝ≥0)
| Distinct | 3833 |
|---|---|
| Distinct (%) | 18.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 206864.4132 |
| Minimum | 14999 |
|---|---|
| Maximum | 500001 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 159.8 KiB |
Quantile statistics
| Minimum | 14999 |
|---|---|
| 5-th percentile | 66260 |
| Q1 | 119500 |
| median | 179700 |
| Q3 | 264700 |
| 95-th percentile | 490560 |
| Maximum | 500001 |
| Range | 485002 |
| Interquartile range (IQR) | 145200 |
Descriptive statistics
| Standard deviation | 115435.6671 |
|---|---|
| Coefficient of variation (CV) | 0.5580257394 |
| Kurtosis | 0.3280374703 |
| Mean | 206864.4132 |
| Median Absolute Deviation (MAD) | 68400 |
| Skewness | 0.9782898909 |
| Sum | 4226860554 |
| Variance | 1.332539324 × 10¹⁰ |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 500001 | 958 | 4.7% |
| 137500 | 119 | 0.6% |
| 162500 | 116 | 0.6% |
| 112500 | 103 | 0.5% |
| 187500 | 92 | 0.5% |
| 225000 | 91 | 0.4% |
| 350000 | 79 | 0.4% |
| 87500 | 77 | 0.4% |
| 275000 | 65 | 0.3% |
| 150000 | 64 | 0.3% |
| Other values (3823) | 18669 | 91.4% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 14999 | 4 | < 0.1% |
| 17500 | 1 | < 0.1% |
| 22500 | 4 | < 0.1% |
| 25000 | 1 | < 0.1% |
| 26600 | 1 | < 0.1% |
| 26900 | 1 | < 0.1% |
| 27500 | 1 | < 0.1% |
| 28300 | 1 | < 0.1% |
| 30000 | 2 | < 0.1% |
| 32500 | 4 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 500001 | 958 | 4.7% |
| 500000 | 27 | 0.1% |
| 499100 | 1 | < 0.1% |
| 499000 | 1 | < 0.1% |
| 498800 | 1 | < 0.1% |
| 498700 | 1 | < 0.1% |
| 498600 | 1 | < 0.1% |
| 498400 | 1 | < 0.1% |
| 497600 | 1 | < 0.1% |
| 497400 | 1 | < 0.1% |
ocean_proximity
Categorical
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 159.8 KiB |
| Value | Count |
|---|---|
| <1H OCEAN | 9034 |
| INLAND | 6496 |
| NEAR OCEAN | 2628 |
| NEAR BAY | 2270 |
| ISLAND | 5 |
Length
| Max length | 10 |
|---|---|
| Median length | 9 |
| Mean length | 8.063035286 |
| Min length | 6 |
Characters and Unicode
| Total characters | 0 |
|---|---|
| Distinct characters | 0 |
| Distinct categories | 0 |
| Distinct scripts | 0 |
| Distinct blocks | 0 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
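As an illustration of such character properties, Python's standard `unicodedata` module reports the Unicode general category of each character. The sketch below tallies those categories for the five ocean_proximity values; it is not how pandas-profiling computes its tables, and the standard library exposes general categories but not the script or block properties mentioned above.

```python
import unicodedata
from collections import Counter

# The five ocean_proximity categories listed in this report.
categories = ["<1H OCEAN", "INLAND", "NEAR OCEAN", "NEAR BAY", "ISLAND"]


def category_counts(text):
    """Count the Unicode general category (e.g. 'Lu', 'Zs', 'Nd') of each character."""
    return Counter(unicodedata.category(ch) for ch in text)


for value in categories:
    # 'Lu' = uppercase letter, 'Zs' = space separator, 'Nd' = decimal digit, 'Sm' = math symbol
    print(f"{value:<11} {dict(category_counts(value))}")
```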
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | NEAR BAY |
|---|---|
| 2nd row | NEAR BAY |
| 3rd row | NEAR BAY |
| 4th row | NEAR BAY |
| 5th row | NEAR BAY |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| <1H OCEAN | 9034 | 44.2% |
| INLAND | 6496 | 31.8% |
| NEAR OCEAN | 2628 | 12.9% |
| NEAR BAY | 2270 | 11.1% |
| ISLAND | 5 | < 0.1% |
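The Frequency (%) column and the ocean_proximity length statistics (max 10, median 9, mean 8.063035286, min 6) can both be reproduced from the five category counts alone. The sketch below assumes, as the report indicates, that every one of the 20433 rows holds exactly one of these five strings; it is a plain-Python check, not pandas-profiling's own code.

```python
import statistics

# Category counts from the Common Values table for ocean_proximity.
counts = {
    "<1H OCEAN": 9034,
    "INLAND": 6496,
    "NEAR OCEAN": 2628,
    "NEAR BAY": 2270,
    "ISLAND": 5,
}
n = sum(counts.values())  # number of observations

# Frequency (%): count divided by the number of observations.
freq = {value: 100 * count / n for value, count in counts.items()}

# Length statistics: one string length per row.
lengths = [len(value) for value, count in counts.items() for _ in range(count)]
mean_length = statistics.mean(lengths)

print("observations:", n)
for value, pct in freq.items():
    print(f"{value:<11} {pct:5.1f}%")
print("max", max(lengths), "median", statistics.median(lengths),
      "mean", round(mean_length, 9), "min", min(lengths))
```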
Length
Histogram of lengths of the category
Pie chart
Words
| Value | Count | Frequency (%) |
|---|---|---|
| ocean | 11662 | 33.9% |
| 1h | 9034 | 26.3% |
| inland | 6496 | 18.9% |
| near | 4898 | 14.3% |
| bay | 2270 | 6.6% |
| island | 5 | < 0.1% |
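The word counts for ocean_proximity are consistent with lowercasing each category, splitting it on whitespace and dropping non-alphanumeric characters (so `<1H` becomes `1h`), with each percentage taken over the total number of words. The sketch below assumes exactly that rule; the helper `words` is hypothetical and is not pandas-profiling's tokenizer.

```python
import re
from collections import Counter

# Category counts from the Common Values table for ocean_proximity.
counts = {
    "<1H OCEAN": 9034,
    "INLAND": 6496,
    "NEAR OCEAN": 2628,
    "NEAR BAY": 2270,
    "ISLAND": 5,
}


def words(value):
    """Hypothetical tokenizer: lowercase, split on whitespace, keep [0-9a-z] only."""
    tokens = (re.sub(r"[^0-9a-z]", "", token) for token in value.lower().split())
    return [token for token in tokens if token]


word_counts = Counter()
for value, count in counts.items():
    for word in words(value):
        word_counts[word] += count  # every row contributes each of its words once

total = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word:<7} {count:>6} {100 * count / total:5.1f}%")
```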
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| No values found. | | |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
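The definition above (Pearson's r applied to rank variables) can be sketched in a few lines of plain Python. The sketch uses average ranks for ties, which is a common convention but an assumption here; the toy data are illustrative and not taken from this dataset. With y = x³ the relationship is strictly monotonic but not linear, so ρ is exactly 1 while Pearson's r is below 1.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Covariance divided by the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables of x and y."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing in x

rho = spearman(x, y)  # the ranks agree exactly
r = pearson(x, y)     # penalised because the relationship is not linear
print(f"rho = {rho:.4f}, r = {r:.4f}")
```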
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for an exact linear relationship the angle of the line to the x-axis does not affect r; only the sign of the slope does. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
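The invariance described above can be checked numerically with a minimal plain-Python sketch of r (covariance over the product of standard deviations). The data values are illustrative only. Shifting and positively rescaling x and y separately leaves r unchanged, and exact linear functions with very different slopes all give r = 1.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative, roughly linear data

r = pearson(x, y)
# Separate location and positive-scale changes to x and y leave r unchanged.
r_transformed = pearson([10 * a - 7 for a in x], [0.5 * b + 100 for b in y])
# Exact linear functions with positive slope give r = 1 whatever their angle.
r_steep = pearson(x, [40 * a + 1 for a in x])
r_shallow = pearson(x, [0.01 * a for a in x])

print(f"r = {r:.5f}, transformed = {r_transformed:.5f}")
print(f"steep line: {r_steep:.5f}, shallow line: {r_shallow:.5f}")
```

A negative scale factor on only one of the variables would flip the sign of r, which is why the invariance is stated for positive scale changes.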
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
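Counting concordant and discordant pairs as described above gives the variant usually called tau-a. The sketch below implements it directly in O(n²) on toy data; it treats tied pairs as neither concordant nor discordant, and library implementations commonly use the tie-corrected tau-b instead, so results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Returns (tau, concordant, discordant). Tied pairs count as neither.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1  # both variables order the pair the same way
        elif sign < 0:
            discordant += 1  # the variables order the pair in opposite ways
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs, concordant, discordant


# Toy data: only the pair at positions 2 and 3 is ordered differently.
tau, c, d = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(f"concordant = {c}, discordant = {d}, tau = {tau:.4f}")
```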
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the phik project documentation.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
First rows
| | df_index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | -122.23 | 37.88 | 41.0 | 880.0 | 129.0 | 322.0 | 126.0 | 8.3252 | 452600.0 | NEAR BAY |
| 1 | 1 | -122.22 | 37.86 | 21.0 | 7099.0 | 1106.0 | 2401.0 | 1138.0 | 8.3014 | 358500.0 | NEAR BAY |
| 2 | 2 | -122.24 | 37.85 | 52.0 | 1467.0 | 190.0 | 496.0 | 177.0 | 7.2574 | 352100.0 | NEAR BAY |
| 3 | 3 | -122.25 | 37.85 | 52.0 | 1274.0 | 235.0 | 558.0 | 219.0 | 5.6431 | 341300.0 | NEAR BAY |
| 4 | 4 | -122.25 | 37.85 | 52.0 | 1627.0 | 280.0 | 565.0 | 259.0 | 3.8462 | 342200.0 | NEAR BAY |
| 5 | 5 | -122.25 | 37.85 | 52.0 | 919.0 | 213.0 | 413.0 | 193.0 | 4.0368 | 269700.0 | NEAR BAY |
| 6 | 6 | -122.25 | 37.84 | 52.0 | 2535.0 | 489.0 | 1094.0 | 514.0 | 3.6591 | 299200.0 | NEAR BAY |
| 7 | 7 | -122.25 | 37.84 | 52.0 | 3104.0 | 687.0 | 1157.0 | 647.0 | 3.1200 | 241400.0 | NEAR BAY |
| 8 | 8 | -122.26 | 37.84 | 42.0 | 2555.0 | 665.0 | 1206.0 | 595.0 | 2.0804 | 226700.0 | NEAR BAY |
| 9 | 9 | -122.25 | 37.84 | 52.0 | 3549.0 | 707.0 | 1551.0 | 714.0 | 3.6912 | 261100.0 | NEAR BAY |
Last rows
| | df_index | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value | ocean_proximity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 20423 | 20630 | -121.32 | 39.29 | 11.0 | 2640.0 | 505.0 | 1257.0 | 445.0 | 3.5673 | 112000.0 | INLAND |
| 20424 | 20631 | -121.40 | 39.33 | 15.0 | 2655.0 | 493.0 | 1200.0 | 432.0 | 3.5179 | 107200.0 | INLAND |
| 20425 | 20632 | -121.45 | 39.26 | 15.0 | 2319.0 | 416.0 | 1047.0 | 385.0 | 3.1250 | 115600.0 | INLAND |
| 20426 | 20633 | -121.53 | 39.19 | 27.0 | 2080.0 | 412.0 | 1082.0 | 382.0 | 2.5495 | 98300.0 | INLAND |
| 20427 | 20634 | -121.56 | 39.27 | 28.0 | 2332.0 | 395.0 | 1041.0 | 344.0 | 3.7125 | 116800.0 | INLAND |
| 20428 | 20635 | -121.09 | 39.48 | 25.0 | 1665.0 | 374.0 | 845.0 | 330.0 | 1.5603 | 78100.0 | INLAND |
| 20429 | 20636 | -121.21 | 39.49 | 18.0 | 697.0 | 150.0 | 356.0 | 114.0 | 2.5568 | 77100.0 | INLAND |
| 20430 | 20637 | -121.22 | 39.43 | 17.0 | 2254.0 | 485.0 | 1007.0 | 433.0 | 1.7000 | 92300.0 | INLAND |
| 20431 | 20638 | -121.32 | 39.43 | 18.0 | 1860.0 | 409.0 | 741.0 | 349.0 | 1.8672 | 84700.0 | INLAND |
| 20432 | 20639 | -121.24 | 39.37 | 16.0 | 2785.0 | 616.0 | 1387.0 | 530.0 | 2.3886 | 89400.0 | INLAND |